author: Joseph B. Rickert
date: September 18, 2019
autosize: true
Center for Strategic and Budgetary Assessments
Implications of Data Science as a Resource
Workshop 3
Privately held company (Boston)
56 % of resources devoted to building open source and free software for R
Professional products: RStudio Server Pro, Connect and Package Manager
Mission
To provide open source and enterprise-ready professional software for the R statistical computing environment
These tools further the cause of equipping everyone, regardless of means, to participate in a global economy that increasingly rewards data literacy.
left: 45%
- Non-Profit Membership Corporation
- Organized under the Linux Foundation
- Governed by a Board of Directors
- Technical committee (ISC) funds projects and oversees work
- More than $1M awarded so far
Mission
Photo by Alex Holyoake
Open source software is software with source code that anyone can inspect, modify, and enhance
Open source projects, products, or initiatives embrace and celebrate principles of open exchange, collaborative participation, rapid prototyping, transparency, meritocracy, and community-oriented development
Source: Linux Foundation
For more open source information TODO
Open Source has reached “critical mass”: there is widespread interoperability among conceptually related tools. For example, you can use R or Python (or R and Python) to build deep learning TensorFlow models, run them on Apache Spark clusters, and manage them with Docker containers and Kubernetes.
Continuous Validation: The most important and useful open source projects are continuously monitored and tested by thousands of experts worldwide.
left: 40%
Some History
- 1995: R released as open-source
- 1997: R Core Group formed
- 1997: CRAN starts with 12 pkgs
- 2000: R 1.0.0 released
- 2001: Bioconductor Project
- 2003: R Foundation formed
- 2004: First useR! conf
- 2009: NY Times article on R
- 2015: The R Consortium
- 2019: CRAN near 15K pkgs
left: 60%
John Chambers’ famous diagram from May 1976 indicates the intention to design a software interface to call an arbitrary FORTRAN subroutine, ABC, by wrapping it in some simplified calling syntax: XABC( ).
The main idea was to bring the best computational facilities to the people doing the analysis. As John phrased it: “combine serious computational challenges with convenience”
“It was always understood that R is meant to build on a base of computational tools. R relies on the ability of functions to communicate with, and exchange objects with other software.”
John Chambers, Extending R, CRC Press (2016)
R is an interpreted scripting language
- Base R has a relatively small footprint
- The majority of growth and innovation comes from contributed packages: libraries of functions
- Features such as non-standard evaluation make it a good choice for “Domain Specific Languages”
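As a rough illustration of non-standard evaluation, the base R sketch below captures an expression before it is evaluated and then evaluates it inside a data frame. The function name `keep_rows` is hypothetical and only for illustration; base R's `subset()` works along similar lines.

```r
# Hypothetical helper: filter rows with a condition written in terms of
# column names, as a tiny "language" for data frames.
keep_rows <- function(data, condition) {
  # Capture the condition as an unevaluated expression (not its value).
  expr <- substitute(condition)
  # Evaluate it with the data frame's columns visible as variables.
  rows <- eval(expr, data, parent.frame())
  data[rows & !is.na(rows), , drop = FALSE]
}

# 'wt' is not a variable in the calling environment; it is looked up in mtcars.
heavy <- keep_rows(mtcars, wt > 5)
print(rownames(heavy))
```

Because the caller writes `wt > 5` rather than `mtcars$wt > 5`, the function defines its own small vocabulary, which is the property that makes this style convenient for building domain specific languages.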
R is a functional, object-based language
- Everything that exists in R is an object
- Everything that happens in R is a function call
- Interfaces to other software are part of R
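The first two points can be checked at the console. This is a minimal base R sketch; the variable names are arbitrary.

```r
# Operators are ordinary functions and can be called by name.
a <- `+`(1, 2)            # same as 1 + 2

# Indexing is also a function call.
x <- c(10, 20, 30)
second <- `[`(x, 2)       # same as x[2]

# Functions are objects: they can be stored in a list and passed around.
ops <- list(add = `+`, mul = `*`)
result <- ops$mul(a, second)

print(class(ops$add))     # functions have a class, like any other object
print(result)
```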
NOAA-Polar-Ice.nb.html
left: 50%
Image by Jingwen Zheng
The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models.
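A minimal sketch of what this streamlining looks like, assuming the caret package is installed (the block does nothing otherwise). The choice of the built-in `iris` data, a k-nearest-neighbours model, and 5-fold cross-validation is purely illustrative.

```r
# Runs only if caret is available; the data set and model are illustrative.
if (requireNamespace("caret", quietly = TRUE)) {
  library(caret)
  set.seed(42)

  # Resampling scheme: 5-fold cross-validation.
  ctrl <- trainControl(method = "cv", number = 5)

  # One call handles resampling, tuning over k, and fitting the final model.
  fit <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)
  print(fit)
}
```

Switching to a different model type is mostly a matter of changing the `method` argument, which is the uniform interface the package aims for.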
Package Level
- Test programs are available for inspection and use
- E.g. almost 10K lines of test code for the survival package

Industry/company Critical Collections
- E.g. Pharmaceutical Industry R Validation Hub
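To give a sense of what package-level test programs look like, here is a small sketch in the style of a plain script in a package's `tests/` directory, using only base R. The function `trimmed_range` is hypothetical and is not taken from any package mentioned here.

```r
# Hypothetical function under test: the range of x after trimming a
# fraction 'trim' from each tail.
trimmed_range <- function(x, trim = 0.1) {
  stopifnot(is.numeric(x), trim >= 0, trim < 0.5)
  unname(quantile(x, c(trim, 1 - trim), na.rm = TRUE))
}

# Test 1: a known answer on simple input.
r <- trimmed_range(0:100)
stopifnot(isTRUE(all.equal(r, c(10, 90))))

# Test 2: invalid input must raise an error rather than return a value.
outcome <- tryCatch(trimmed_range("a"), error = function(e) "error")
stopifnot(identical(outcome, "error"))
```

A test script like this fails loudly when an assertion does not hold, so anyone can rerun it to check the package's behaviour on their own system.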
https://www.tensorflow.org
***
See the Google Brain paper from Abadi et al. (2017) for the details.
TensorFlow graphs represent mathematical operations
left: 60%
R Markdown
Reproducibly integrate code and text
Use notebooks to weave together narrative text and code from several computer languages (including R, Python, and SQL) to produce formatted output in several document types (HTML, PDF, etc.).
left: 40%
- Connect to Spark Clusters
- Use as backend to dplyr
- Filter and aggregate Spark data sets
- Use Spark’s MLlib
Kubernetes and Docker Pipeline
Containers:
- Package up everything needed to run App
- Reliably deploy Apps on multiple platforms
- Distribute Apps across clusters
Within teams and between departments
For many more see:
- The Shiny Gallery
- The Shiny User Showcase
- showmeshiny
joseph.rickert@rstudio.com @RStudioJoe
This presentation available at: https://github.com/joseph-rickert/OSD-Sept-2019